The World Bank, as part of its mission, has maintained detailed records concerning the
economic activity of countries around the world for over half a century. In particular, the World Bank keeps records of the
Gross Domestic Product (GDP) by country spanning the years from
1960 to 2021. The World Bank site includes many tools for visually analysing this GDP data. This
page at the World Bank data site allows users to plot the yearly
GDP of a particular country in terms of current US dollars.
The raw economic data collected by the World Bank are freely available for download and analysis. In particular, the yearly
data on GDP in current US dollars are available as a comma-separated values (CSV) file.
The scope of this assignment is to carry out data wrangling and exploratory data analysis on the GDP data using the Python data manipulation and analysis package Pandas, with the aim to:
Plot the yearly GDP of a specified list of countries as a line chart (XY plot) using the Python visualisation package Pygal.
Plot the GDP data for a given year on a world map using Pygal. The plot should have a graphical functionality similar to this page at the World Bank data site. The map should depict not only the GDP data, but also the countries which are missing from the GDP data entirely and the countries that are contained within the GDP data, but have no data for the given year.
A CSV file containing GDP data up until the end of 2021 is available here. Note that the first two columns correspond to the Country Name and Country Code for each country in the file. These are followed by the Indicator Name and Indicator Code columns, and then by the GDP data (in current US dollars) for the years 1960-2021 inclusive.
The World Bank data set uses the three-letter Country Code as per ISO3166-1-Alpha-3, while the list of countries supported in Pygal uses two-letter country codes as per ISO3166-1-Alpha-2. To reconcile Pygal's country information with the World Bank's country information an additional CSV file from the WorldData site is used. WorldData is a comprehensive database for geodetic, climatological and demographic data. It provides a wide variety of analyses and global comparisons as well as data sheets for each country with additional development data and charts for several areas of expertise. The file, which has been slightly edited for consistency (the second line has been deleted), can be found here.
Load libraries:
import pandas as pd
import numpy as np
import pygal
from pygal_maps_world.maps import World
from pygal_maps_world.maps import COUNTRIES
import os
import pprint
Load GDP data set:
gdp_wide_df = pd.read_csv(r"..\clean_data\gdp_data_clean.csv")
gdp_wide_df
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2.615084e+09 | 2.727933e+09 | 2.791061e+09 | 2.963128e+09 | 2.983799e+09 | 3.092179e+09 | 3.202235e+09 | 3.310056e+09 | 2.496648e+09 | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP (current US$) | NY.GDP.MKTP.CD | 2.129059e+10 | 2.180847e+10 | 2.370702e+10 | 2.821004e+10 | 2.611879e+10 | 2.968217e+10 | ... | 9.730430e+11 | 9.839370e+11 | 1.003680e+12 | 9.242530e+11 | 8.823550e+11 | 1.020650e+12 | 9.910220e+11 | 9.975340e+11 | 9.216460e+11 | 1.082100e+12 |
| 2 | Afghanistan | AFG | GDP (current US$) | NY.GDP.MKTP.CD | 5.377778e+08 | 5.488889e+08 | 5.466667e+08 | 7.511112e+08 | 8.000000e+08 | 1.006667e+09 | ... | 1.990732e+10 | 2.014640e+10 | 2.049713e+10 | 1.913421e+10 | 1.811656e+10 | 1.875347e+10 | 1.805323e+10 | 1.879945e+10 | 2.011614e+10 | NaN |
| 3 | Africa Western and Central | AFW | GDP (current US$) | NY.GDP.MKTP.CD | 1.040414e+10 | 1.112789e+10 | 1.194319e+10 | 1.267633e+10 | 1.383837e+10 | 1.486223e+10 | ... | 7.275700e+11 | 8.207930e+11 | 8.649900e+11 | 7.607340e+11 | 6.905460e+11 | 6.837490e+11 | 7.416900e+11 | 7.945430e+11 | 7.844460e+11 | 8.358080e+11 |
| 4 | Angola | AGO | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1.249980e+11 | 1.334020e+11 | 1.372440e+11 | 8.721929e+10 | 4.984049e+10 | 6.897276e+10 | 7.779294e+10 | 6.930910e+10 | 5.361907e+10 | 7.254699e+10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | Kosovo | XKX | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 6.163785e+09 | 6.735731e+09 | 7.074658e+09 | 6.295820e+09 | 6.682833e+09 | 7.180813e+09 | 7.878509e+09 | 7.899879e+09 | 7.716925e+09 | 9.007159e+09 |
| 262 | Yemen, Rep. | YEM | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 3.540134e+10 | 4.041524e+10 | 4.322859e+10 | 4.244450e+10 | 3.131783e+10 | 2.684223e+10 | 2.160616e+10 | 2.188761e+10 | 1.884051e+10 | 2.106169e+10 |
| 263 | South Africa | ZAF | GDP (current US$) | NY.GDP.MKTP.CD | 8.748597e+09 | 9.225996e+09 | 9.813996e+09 | 1.085420e+10 | 1.195600e+10 | 1.306899e+10 | ... | 4.344010e+11 | 4.008860e+11 | 3.811990e+11 | 3.467100e+11 | 3.235860e+11 | 3.814490e+11 | 4.048420e+11 | 3.879350e+11 | 3.354420e+11 | 4.199460e+11 |
| 264 | Zambia | ZMB | GDP (current US$) | NY.GDP.MKTP.CD | 7.130000e+08 | 6.962857e+08 | 6.931429e+08 | 7.187143e+08 | 8.394286e+08 | 1.082857e+09 | ... | 2.550306e+10 | 2.803724e+10 | 2.714102e+10 | 2.125122e+10 | 2.095841e+10 | 2.587360e+10 | 2.631159e+10 | 2.330867e+10 | 1.811063e+10 | 2.120306e+10 |
| 265 | Zimbabwe | ZWE | GDP (current US$) | NY.GDP.MKTP.CD | 1.052990e+09 | 1.096647e+09 | 1.117602e+09 | 1.159512e+09 | 1.217138e+09 | 1.311436e+09 | ... | 1.711485e+10 | 1.909102e+10 | 1.949552e+10 | 1.996312e+10 | 2.054868e+10 | 1.758489e+10 | 1.811554e+10 | 1.928429e+10 | 1.805117e+10 | 2.621773e+10 |
266 rows × 66 columns
Load GDP country codes data set:
# Disable the NA filter so that Namibia's country code "NA" is not parsed as a missing value
country_codes_df = pd.read_csv(r"..\clean_data\country_codes_clean.csv",
keep_default_na = False, na_values = "")
country_codes_df
| | Country | ISO3166-1-Alpha-2 | ISO3166-1-Alpha-3 | ISO3166-1-numeric | IOC | Fips 10 | License Plate | Domain |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AF | AFG | 4.0 | AFG | AF | AFG | .af |
| 1 | Åland Islands | AX | ALA | 248.0 | NaN | NaN | AX | .ax |
| 2 | Albania | AL | ALB | 8.0 | ALB | AL | AL | .al |
| 3 | Algeria | DZ | DZA | 12.0 | ALG | AG | DZ | .dz |
| 4 | American Samoa | AS | ASM | 16.0 | ASA | AQ | USA | .as |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 244 | Wallis and Futuna | WF | WLF | 876.0 | NaN | WF | NaN | .wf |
| 245 | Western Sahara | EH | ESH | 732.0 | NaN | WI | WSA | .eh |
| 246 | Yemen | YE | YEM | 887.0 | YEM | YM | YEM | .ye |
| 247 | Zambia | ZM | ZMB | 894.0 | ZAM | ZA | Z | .zm |
| 248 | Zimbabwe | ZW | ZWE | 716.0 | ZIM | ZI | ZW | .zw |
249 rows × 8 columns
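The `keep_default_na` setting above matters because pandas treats the literal string `NA` as a missing value by default. The following is a minimal sketch of that behaviour on a three-row, in-memory CSV. The sample rows and the column name `alpha2` are invented for illustration and are not taken from the WorldData file.

```python
import io

import pandas as pd

# Hypothetical sample: Namibia's Alpha-2 code is the literal string "NA",
# and the last row has a genuinely empty code field.
sample_csv = "country,alpha2\nNamibia,NA\nNorway,NO\nNowhere,\n"

# Default parsing: "NA" is on pandas' default list of missing-value markers.
default_df = pd.read_csv(io.StringIO(sample_csv))

# Same options as the notebook: only the empty string counts as missing.
strict_df = pd.read_csv(io.StringIO(sample_csv), keep_default_na=False, na_values="")

# Compare which rows each parse considers missing.
print(default_df["alpha2"].isna().tolist())
print(strict_df["alpha2"].isna().tolist())
```

With the stricter options, Namibia keeps its code while truly empty fields are still flagged as missing.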
gdp_df = gdp_wide_df.melt(
id_vars = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"],
var_name = "year",
value_name = "gdp"
)
gdp_df
| | Country Name | Country Code | Indicator Name | Indicator Code | year | gdp |
|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 2.129059e+10 |
| 2 | Afghanistan | AFG | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 5.377778e+08 |
| 3 | Africa Western and Central | AFW | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 1.040414e+10 |
| 4 | Angola | AGO | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 16487 | Kosovo | XKX | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 9.007159e+09 |
| 16488 | Yemen, Rep. | YEM | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.106169e+10 |
| 16489 | South Africa | ZAF | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 4.199460e+11 |
| 16490 | Zambia | ZMB | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.120306e+10 |
| 16491 | Zimbabwe | ZWE | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.621773e+10 |
16492 rows × 6 columns
Rename the columns to snake_case and apply the change in place.
gdp_df.rename(
columns = {
"Country Name": "country_name",
"Country Code": "country_code",
"Indicator Name": "indicator_name",
"Indicator Code": "indicator_code"
},
inplace = True
)
gdp_df
| | country_name | country_code | indicator_name | indicator_code | year | gdp |
|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 2.129059e+10 |
| 2 | Afghanistan | AFG | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 5.377778e+08 |
| 3 | Africa Western and Central | AFW | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | 1.040414e+10 |
| 4 | Angola | AGO | GDP (current US$) | NY.GDP.MKTP.CD | 1960 | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 16487 | Kosovo | XKX | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 9.007159e+09 |
| 16488 | Yemen, Rep. | YEM | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.106169e+10 |
| 16489 | South Africa | ZAF | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 4.199460e+11 |
| 16490 | Zambia | ZMB | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.120306e+10 |
| 16491 | Zimbabwe | ZWE | GDP (current US$) | NY.GDP.MKTP.CD | 2021 | 2.621773e+10 |
16492 rows × 6 columns
View the GDP dataframe information.
gdp_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16492 entries, 0 to 16491
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   country_name    16492 non-null  object
 1   country_code    16492 non-null  object
 2   indicator_name  16492 non-null  object
 3   indicator_code  16492 non-null  object
 4   year            16492 non-null  object
 5   gdp             13118 non-null  float64
dtypes: float64(1), object(5)
memory usage: 773.2+ KB
Change year type from object to integer.
gdp_df["year"] = gdp_df["year"].astype(int)
Check if there are any missing values.
gdp_df.isna().sum()
country_name         0
country_code         0
indicator_name       0
indicator_code       0
year                 0
gdp               3374
dtype: int64
The GDP data set is cleaned up and in a tidy format now.
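To make the wide-to-tidy reshaping concrete, here is a small sketch on a toy table. The country codes and values are invented; only the layout (one column per year) mirrors the World Bank file.

```python
import pandas as pd

# Toy wide table: one row per country, one column per year (values invented).
wide_df = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "1960": [1.0, None],
    "1961": [2.0, 3.0],
})

# Tidy layout: one row per (country, year) pair instead of one column per year.
long_df = wide_df.melt(id_vars=["Country Code"], var_name="year", value_name="gdp")

# Year labels come from column headers, so they start out as strings.
long_df["year"] = long_df["year"].astype(int)

print(long_df)
```

Two countries and two year columns give four rows, and the missing value in the wide table survives as a missing `gdp` entry in the long one.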
View the Country code dataframe information.
country_codes_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 249 entries, 0 to 248
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Country            249 non-null    object
 1   ISO3166-1-Alpha-2  249 non-null    object
 2   ISO3166-1-Alpha-3  249 non-null    object
 3   ISO3166-1-numeric  248 non-null    float64
 4   IOC                205 non-null    object
 5   Fips 10            245 non-null    object
 6   License Plate      213 non-null    object
 7   Domain             249 non-null    object
dtypes: float64(1), object(7)
memory usage: 15.7+ KB
The Country code dataframe is already in a tidy format, only columns ISO3166-1-Alpha-2 and ISO3166-1-Alpha-3 are of interest.
country_codes_trimmed_df = country_codes_df.loc[:, ["ISO3166-1-Alpha-2", "ISO3166-1-Alpha-3"]].copy()
country_codes_trimmed_df
| | ISO3166-1-Alpha-2 | ISO3166-1-Alpha-3 |
|---|---|---|
| 0 | AF | AFG |
| 1 | AX | ALA |
| 2 | AL | ALB |
| 3 | DZ | DZA |
| 4 | AS | ASM |
| ... | ... | ... |
| 244 | WF | WLF |
| 245 | EH | ESH |
| 246 | YE | YEM |
| 247 | ZM | ZMB |
| 248 | ZW | ZWE |
249 rows × 2 columns
Check if there are any missing values.
country_codes_trimmed_df.isna().sum()
ISO3166-1-Alpha-2    0
ISO3166-1-Alpha-3    0
dtype: int64
Map the country codes from country_codes_trimmed_df to the country codes from gdp_df with a left join, and rename the ISO3166-1-Alpha-2 column to plot_code.
gdp_join_df = gdp_df \
.reset_index() \
.merge(country_codes_trimmed_df, how = "left", left_on = "country_code", right_on = "ISO3166-1-Alpha-3") \
.drop(columns = ["index", "ISO3166-1-Alpha-3", "indicator_name", "indicator_code"])
gdp_join_df.rename(columns={"ISO3166-1-Alpha-2": "plot_code"}, inplace = True)
gdp_join_df.head()
| | country_name | country_code | year | gdp | plot_code |
|---|---|---|---|---|---|
| 0 | Aruba | ABW | 1960 | NaN | AW |
| 1 | Africa Eastern and Southern | AFE | 1960 | 2.129059e+10 | NaN |
| 2 | Afghanistan | AFG | 1960 | 5.377778e+08 | AF |
| 3 | Africa Western and Central | AFW | 1960 | 1.040414e+10 | NaN |
| 4 | Angola | AGO | 1960 | NaN | AO |
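Rows such as "Africa Eastern and Southern" end up with a missing plot_code because a left join keeps every World Bank row, whether or not the lookup table has a matching three-letter code. The sketch below reproduces this on toy frames; `gdp_toy` and `codes_toy` are illustrative stand-ins, and the `indicator=True` option is an addition not used in the notebook itself.

```python
import pandas as pd

# Toy stand-ins for gdp_df and country_codes_trimmed_df.
gdp_toy = pd.DataFrame({"country_code": ["ABW", "AFE", "AFG"]})
codes_toy = pd.DataFrame({
    "ISO3166-1-Alpha-2": ["AW", "AF"],
    "ISO3166-1-Alpha-3": ["ABW", "AFG"],
})

# indicator=True adds a "_merge" column recording where each row was found.
joined = gdp_toy.merge(
    codes_toy,
    how="left",
    left_on="country_code",
    right_on="ISO3166-1-Alpha-3",
    indicator=True,
)

# Rows present only in the left table have no two-letter code.
unmatched = joined.loc[joined["_merge"] == "left_only", "country_code"].tolist()
print(unmatched)
```

Listing the unmatched codes this way is one option for checking that only regional aggregates, and no real countries, fail to map.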
Convert the values of the plot_code column to lowercase so that they can be compared with the Pygal country codes.
gdp_join_df["plot_code"] = gdp_join_df["plot_code"].str.lower()
gdp_join_df.head()
| | country_name | country_code | year | gdp | plot_code |
|---|---|---|---|---|---|
| 0 | Aruba | ABW | 1960 | NaN | aw |
| 1 | Africa Eastern and Southern | AFE | 1960 | 2.129059e+10 | NaN |
| 2 | Afghanistan | AFG | 1960 | 5.377778e+08 | af |
| 3 | Africa Western and Central | AFW | 1960 | 1.040414e+10 | NaN |
| 4 | Angola | AGO | 1960 | NaN | ao |
Get the Pygal country code mapping and all World Bank GDP data country codes.
pygal_countries_dict = COUNTRIES
groupby_plot_code = gdp_join_df.groupby("plot_code")
# Compare number of country codes included in Pygal and the World Bank data
{"pygal_country_codes": len(COUNTRIES), "world_bank_country_codes": len(groupby_plot_code)}
{'pygal_country_codes': 184, 'world_bank_country_codes': 216}
Pygal lists fewer country codes than the World Bank GDP data.
Check which Pygal country codes have no match against the World Bank country codes
pygal_notin_wbd = {}
for country_code_key in pygal_countries_dict.keys():
    if country_code_key not in list(groupby_plot_code.groups):
        pygal_notin_wbd[country_code_key] = pygal_countries_dict[country_code_key]
pprint.pprint(pygal_notin_wbd)
print("\n")
print("Number of country codes: " + str(len(pygal_notin_wbd)))
{'aq': 'Antarctica',
'eh': 'Western Sahara',
'gf': 'French Guiana',
're': 'Reunion',
'sh': 'Saint Helena, Ascension and Tristan da Cunha',
'tw': 'Taiwan, Province of China',
'va': 'Holy See (Vatican City State)',
'yt': 'Mayotte'}
Number of country codes: 8
Find the countries that are missing from Pygal's list of country codes.
groupby_code_country = gdp_join_df.groupby(["plot_code", "country_name"])
wbd_notin_pygal = {}
for country_code_key in list(groupby_code_country.groups):
    if country_code_key[0] not in pygal_countries_dict.keys():
        wbd_notin_pygal[country_code_key[0]] = country_code_key[1]
pprint.pprint(wbd_notin_pygal)
print("\n")
print("Number of country codes: " + str(len(wbd_notin_pygal)))
{nan: 'World',
'ag': 'Antigua and Barbuda',
'as': 'American Samoa',
'aw': 'Aruba',
'bb': 'Barbados',
'bm': 'Bermuda',
'bs': 'Bahamas, The',
'cw': 'Curacao',
'dm': 'Dominica',
'fj': 'Fiji',
'fm': 'Micronesia, Fed. Sts.',
'fo': 'Faroe Islands',
'gd': 'Grenada',
'gi': 'Gibraltar',
'im': 'Isle of Man',
'ki': 'Kiribati',
'km': 'Comoros',
'kn': 'St. Kitts and Nevis',
'ky': 'Cayman Islands',
'lc': 'St. Lucia',
'mf': 'St. Martin (French part)',
'mh': 'Marshall Islands',
'mp': 'Northern Mariana Islands',
'nc': 'New Caledonia',
'nr': 'Nauru',
'pf': 'French Polynesia',
'pw': 'Palau',
'qa': 'Qatar',
'sb': 'Solomon Islands',
'ss': 'South Sudan',
'sx': 'Sint Maarten (Dutch part)',
'tc': 'Turks and Caicos Islands',
'to': 'Tonga',
'tt': 'Trinidad and Tobago',
'tv': 'Tuvalu',
'vc': 'St. Vincent and the Grenadines',
'vg': 'British Virgin Islands',
'vi': 'Virgin Islands (U.S.)',
'vu': 'Vanuatu',
'ws': 'Samoa',
'xk': 'Kosovo'}
Number of country codes: 41
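The two comparisons above are set differences taken in opposite directions. As an alternative to the explicit loops, the same idea can be sketched with Python's set operators on dictionary keys. The two small dictionaries below are hypothetical miniatures, not the real Pygal or World Bank code lists.

```python
# Hypothetical miniature versions of the two code collections.
pygal_codes = {"af": "Afghanistan", "aq": "Antarctica", "zw": "Zimbabwe"}
world_bank_codes = {"af": "Afghanistan", "aw": "Aruba", "zw": "Zimbabwe"}

# dict.keys() returns a set-like view, so set operators apply directly.
only_in_pygal = pygal_codes.keys() - world_bank_codes.keys()
only_in_world_bank = world_bank_codes.keys() - pygal_codes.keys()

# Rebuild code -> name dictionaries for the unmatched codes.
pygal_notin_wb = {code: pygal_codes[code] for code in sorted(only_in_pygal)}
wb_notin_pygal = {code: world_bank_codes[code] for code in sorted(only_in_world_bank)}

print(pygal_notin_wb)
print(wb_notin_pygal)
```

Set differences avoid rebuilding a list of group keys on every loop iteration, which may matter little at this scale but keeps the intent explicit.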
Code to enable embedding of interactive visualizations directly into this notebook.
from IPython.display import display, HTML
base_html = """
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
<script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
</head>
<body>
<figure>
{rendered_chart}
</figure>
</body>
</html>
"""
Functions for plotting GDP for a selection of countries:
# the oldest year for which there is data in the World Bank GDP data csv file
START_YEAR = 1960
def build_plot_dict(gdp_df, country_list):
    """
    Inputs:
      gdp_df       - GDP dataframe
      country_list - List of strings that are country names
    Output:
      Returns a dictionary whose keys are the country names in
      country_list and whose values are lists of XY plot (line chart)
      values computed from the GDP dataframe gdp_df in trillion
      US dollars.
      Countries from country_list that do not appear in the
      GDP dataframe should still be in the output dictionary, but
      with an empty XY plot value list.
    """
    gdp_plotdata_dict = {}
    for country in country_list:
        gdp_country_df = gdp_df.loc[gdp_df["country_name"] == country].copy()
        gdp_country_list = [gdp/1e12 for gdp in gdp_country_df["gdp"].tolist()]
        gdp_plotdata_dict[country] = gdp_country_list
    return gdp_plotdata_dict
def render_line_plot(gdp_df, country_list, min_year, max_year):
    """
    Inputs:
      gdp_df       - GDP dataframe
      country_list - List of strings that are country names
      min_year     - GDP plot oldest year
      max_year     - GDP plot most recent year
    Output:
      Returns None.
    Action:
      Creates an SVG image of an XY plot (line chart) for the
      GDP data specified by gdp_df for the countries in country_list.
      The image is embedded in the notebook and it is also
      stored in a file in the outputs folder.
    """
    gdp_plotdata_dict = build_plot_dict(gdp_df, country_list)
    line_chart = pygal.Line(
        x_title = "Year",
        y_title = "GDP ($ Trillion)",
        x_label_rotation = 300,
        dots_size = 1.5,
        width = 800,
        height = 400
    )
    line_chart.title = "GDP Over Time for Selected Countries (Years " + str(min_year) + " to " + str(max_year) + ")"
    line_chart.x_labels = map(str, range(min_year, max_year + 1))
    for country in gdp_plotdata_dict.keys():
        gdp_country = gdp_plotdata_dict[country]
        line_chart.add(country, gdp_country[(min_year - START_YEAR) : (max_year - START_YEAR + 1)])
    line_chart.value_formatter = lambda x: "%.2f" % x
    line_chart.render_to_file("..\\outputs\\line_chart_" + str(min_year) + "_" + str(max_year) + ".svg")
    display(HTML(base_html.format(rendered_chart=line_chart.render(is_unicode = True))))
Plot the World Bank GDP data (in trillions of US dollars) for the G7 countries spanning 1970 to 2021. The output is an SVG image of the resulting line plot.
# Alternatively, we can use gdp_join_df to get the same plots
render_line_plot(gdp_df, ["Canada", "France", "Germany", "Italy", "Japan", "United Kingdom", "United States"], 1970, 2021)
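The line plot selects its year range by list position rather than by filtering on the year column: the value for a given year sits at index `year - START_YEAR`. The sketch below isolates that arithmetic. The helper `slice_years` is hypothetical, and it assumes one value per year in ascending order starting at `START_YEAR`, which holds for the melted GDP data.

```python
START_YEAR = 1960  # first year in the data, as in the notebook


def slice_years(values, min_year, max_year, start_year=START_YEAR):
    """Return the values for min_year..max_year inclusive (hypothetical helper).

    Assumes values[0] belongs to start_year and there is one value per year.
    """
    return values[min_year - start_year : max_year - start_year + 1]


# Stand-in series: each "GDP" value is just its own year, so the slice is easy to check.
yearly_values = list(range(1960, 2022))

selected = slice_years(yearly_values, 1970, 2021)
print(selected[0], selected[-1], len(selected))
```

If the rows were ever reordered or a year were dropped, positional slicing would silently shift the series, so filtering on the year column would be the safer choice in that case.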
Functions for creating the choropleth maps:
def reconcile_countries_by_code(gdp_join_df, country_codes_trimmed_df, year):
    """
    Inputs:
      gdp_join_df              - GDP dataframe reconciled with Pygal's country codes
      country_codes_trimmed_df - Dataframe that maps World Bank's country information
                                 with Pygal's country information.
      year                     - Integer year for which to create GDP mapping
    Output:
      A tuple containing a dictionary and two lists. The dictionary maps
      two-letter country codes (the plot_code column) to the log (base 10)
      of the GDP value for that country in the specified year. The first list
      contains the country codes from countries which are entirely missing
      from the World Bank GDP data. The second list contains country codes
      from countries lacking a GDP for the specified year.
    """
    gdp_year_df = gdp_join_df.loc[gdp_join_df["year"] == year].copy()
    # Keep only complete rows (a GDP value and a plot code) and map code -> log10(GDP)
    gdp_complete = gdp_year_df.loc[~ gdp_year_df.isna().any(axis='columns')].copy()
    gdp_complete_dict = dict(zip(gdp_complete["plot_code"], np.log10(gdp_complete["gdp"])))
    # Get list of countries with a GDP for the specified year
    codes_gdp_list = gdp_complete["plot_code"].tolist()
    # Get list of countries lacking a GDP for the specified year and remove NaN codes from list
    codes_no_gdp_full = gdp_year_df.loc[gdp_year_df["gdp"].isna()].copy()
    codes_nogdp_list = \
        [x for x in codes_no_gdp_full["plot_code"].tolist() if pd.notna(x)]
    # Get list of all country codes, remove NaN values and lowercase all items
    codes_trimmed_list = \
        [x.lower() for x in country_codes_trimmed_df["ISO3166-1-Alpha-2"].tolist() if pd.notna(x)]
    # Get countries not listed in World Bank data: all codes minus those with or without a GDP
    codes_dummy = list(set(codes_trimmed_list).difference(set(codes_gdp_list)))
    codes_wbd_missing_list = list(set(codes_dummy).difference(set(codes_nogdp_list)))
    return gdp_complete_dict, codes_wbd_missing_list, codes_nogdp_list
def render_world_map(gdp_join_df, country_codes_trimmed_df, year):
    """
    Inputs:
      gdp_join_df              - GDP dataframe reconciled with Pygal's country codes
      country_codes_trimmed_df - Dataframe that maps World Bank's country information
                                 with Pygal's country information.
      year                     - Integer year for which to create GDP mapping
    Output:
      Returns None.
    Action:
      Creates a world map plot of the GDP data (choropleth map). The image
      is embedded in the notebook and it is also stored in a file in the
      outputs folder.
    """
    gdp_complete_dict, codes_wbd_missing_list, codes_nogdp_list = \
        reconcile_countries_by_code(gdp_join_df, country_codes_trimmed_df, year)
    worldmap_chart = World(width = 800, height = 400, legend_at_bottom = True, legend_at_bottom_columns = 3)
    worldmap_chart.title = "GDP by Country for " + str(year) + " (log10 scale)"
    worldmap_chart.add("GDP for " + str(year), gdp_complete_dict)
    worldmap_chart.add("Missing from World Bank Data", codes_wbd_missing_list)
    worldmap_chart.add("No GDP Data", codes_nogdp_list)
    worldmap_chart.value_formatter = lambda x: "%.2f" % x
    worldmap_chart.render_to_file("..\\outputs\\map_file_" + str(year) + ".svg")
    # worldmap_chart.render_in_browser()
    display(HTML(base_html.format(rendered_chart=worldmap_chart.render(is_unicode=True))))
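The three-way split that reconcile_countries_by_code performs can be illustrated without pandas. The sketch below assumes a single year of data held in a plain dictionary. The helper name `partition_codes` and the sample values are hypothetical; this is not the notebook's implementation, only the same set logic in miniature.

```python
def partition_codes(all_codes, gdp_by_code):
    """Split codes into (with_gdp, missing_entirely, no_gdp_this_year).

    gdp_by_code maps a code to its GDP for one year, or None when the
    country is in the data but has no value for that year. Hypothetical helper.
    """
    with_gdp = {code for code, gdp in gdp_by_code.items() if gdp is not None}
    no_gdp = set(gdp_by_code) - with_gdp
    missing = set(all_codes) - set(gdp_by_code)
    return with_gdp, missing, no_gdp


# Invented single-year data; only the shape matters.
all_codes = ["af", "aw", "eh", "zw"]
gdp_for_year = {"af": 5.4e8, "aw": None, "zw": 1.05e9}

with_gdp, missing, no_gdp = partition_codes(all_codes, gdp_for_year)
print(sorted(with_gdp), sorted(missing), sorted(no_gdp))
```

Every code lands in exactly one of the three groups, which is what lets the map show each group as its own series.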
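Plot the logarithmically-scaled (base 10) GDP data on a world map for a selection of years (1960, 1970, 2000 and 2021).
Each world map colours countries in three groups: those with GDP data for the given year, those missing from the World Bank data entirely, and those contained in the World Bank data but lacking a GDP value for the given year.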
# Year 1960
render_world_map(gdp_join_df, country_codes_trimmed_df, 1960)
# Year 1970
render_world_map(gdp_join_df, country_codes_trimmed_df, 1970)
# Year 2000
render_world_map(gdp_join_df, country_codes_trimmed_df, 2000)
# Year 2021
render_world_map(gdp_join_df, country_codes_trimmed_df, 2021)
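The maps use a log10 scale because GDP values span several orders of magnitude, and a linear colour scale would leave most countries indistinguishable from one another. As a brief worked example, the sketch below applies the same transformation to two 2021 values copied from the GDP table shown earlier; the variable names are illustrative only.

```python
import math

# 2021 GDP values (current US$) copied from the table earlier in this notebook.
gdp_2021 = {"Zimbabwe": 2.621773e10, "South Africa": 4.199460e11}

# log10 turns a multiplicative gap into an additive one.
log_gdp = {name: math.log10(value) for name, value in gdp_2021.items()}

ratio = gdp_2021["South Africa"] / gdp_2021["Zimbabwe"]
gap = log_gdp["South Africa"] - log_gdp["Zimbabwe"]

for name, value in log_gdp.items():
    print(f"{name}: {value:.2f}")
print(f"ratio {ratio:.1f}x, gap on log10 scale {gap:.2f}")
```

A difference of 1 on the map's scale therefore corresponds to a tenfold difference in GDP.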
The main insights drawn from the analysis of the GDP over time plots are:
The main insights drawn from the analysis of world map GDP plots are:
Country codes in Pygal that have no match against the World Bank GDP data country codes are marked in blue on the world map. Conversely, some World Bank country codes are missing from the Pygal country codes. These codes correspond to countries (e.g. Qatar, Faroe Islands, South Sudan), overseas territories or regions of countries. For some of these country codes, GDP data have not been recorded by the World Bank.